Introduction

The following discussion introduces the framework for the new
storage system data recovery design. Specifications for the new
storage system were given in MTB-110. This bulletin is concerned
with the part of the data recovery task currently termed
"salvaging." (The remaining part is backup and retrieval.) Data
recovery mechanisms exist because of imperfections in both
hardware and software. The reason for a salvager redesign is to
increase two important Multics attributes: availability and
reliability. Availability implies that stored data should always
be dynamically accessible at the demand of any user, while
reliability implies no loss of stored data as well as the safe
storage of the security information used to protect the stored
data.

This MTB proposes a major change to today's salvaging operation.
To accommodate storage growth, salvaging will become dynamic and
distributed. More of the errors corrected by the salvager will
become user visible. An implementation plan which chronicles
the design decisions still to be made is given in the companion
MTB-221, "New Storage System Salvager Implementation." In order
to explain why this MTB's design is being proposed, some
background material is presented first.

Storage System Overview 

As a first-order approximation, the Multics storage system can be
viewed as a logical organization for an array of file maps. This
logical organization can be broken into two parts: directory
control and storage control. Directory control handles the
logical structuring of the user data and stores the security
information. A directory consists of objects (branches, names,
acls, etc.) whose data is held in structures. Relations among the
objects are implemented by threading the structures together.
Storage control manages the file map arrays. A file map is
nothing more than an array of physically sequenced keys to the
stored data.

Stored data can be "found" by only one method: logically
traversing a hierarchical structure of directories. Since
directories have internal structure, a successful logical
traversal requires a physically correct internal structure. Errors
have been caused by human, probabilistic (hardware), and even
cosmic (unknown, such as lightning causing a power outage)
actions. Because only the human cause of these errors can
(theoretically) be eliminated, error detection and recovery are
necessary. The mechanism currently used for this purpose is the
offline salvager. The system is crashed upon detection of an
error, and correction is achieved by running the salvager (thus
making the system unavailable for useful work). Such an operation
is necessary today, but its cost is too high for the service
provided, since most of the directories salvaged have no errors.



New Storage System Structure



The New Storage System (NSS) splits current directory branches
into two parts, the logical attributes and the physical
attributes. The physical attributes are stored in a
self-consistent format, the volume table of contents entry
(vtoce), which contains a uid pathname. The connection between an
NSS directory branch and a vtoce is logical in both directions.
This design is inherently more costly to process for the salvager
as well as for the storage system, because two disk references,
one for the branch and the other for the vtoce, are often necessary
where one sufficed previously.

Projecting the present salvager's operation into an NSS format
gives running time estimates of 5 hours for a 133 disk drive
system. Performing the same operation with a multi-process
salvager could cut this time down by 1/2 to 1/10, depending on the
hardware configuration. Unfortunately, the near-future capacity
doubling of the MSU0400s makes even the multi-process approach
unacceptable.
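
The estimate above can be worked through as a back-of-the-envelope sketch. It rests on two assumptions. First, "cut this time down by 1/2 to 1/10" is read as meaning that the multi-process running time falls to between one half and one tenth of the 5-hour single-process figure. Second, the hierarchy grows with disk capacity, and salvaging time grows linearly with hierarchy size.

```latex
\begin{aligned}
T_{1} &= 5\ \text{h} \\
T_{\text{multi}} &\in \left[\tfrac{1}{10}\,T_{1},\ \tfrac{1}{2}\,T_{1}\right] = [0.5\ \text{h},\ 2.5\ \text{h}] \\
T'_{\text{multi}} &= 2\,T_{\text{multi}} \in [1\ \text{h},\ 5\ \text{h}]
\end{aligned}
```

Under these assumptions, even the best multi-process case still costs about an hour of unavailability per full salvage once capacity doubles.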

Current directory control is coded with the assumption that
threads and relative pointers are always valid. Thus a brief
description of the salvager's action would be that its primary
purpose is only to prevent faults on thread and relative pointer
references. A walkthrough of the salvager code reveals that
directory control relies on few of the other parts of a directory
object's structure. Other benefits derived from salvaging are
garbage collection (directory compaction and the freeing of space
used by process directories and hardcore segments) and quota
verification. Some cross-checking on acl structures and
access-class relationships is done in an attempt to establish
security non-compromise.



The salvager also checks for reused addresses by recording all
page assignments in a new free storage map which replaces the old
one at the end of salvaging. This task has been split out of the
directory salvager by the NSS design, since every volume (disk
pack) now contains its own map. A salvage operation over a volume
will be performed by a volume salvager, to be described later.

Terminology 

Before proceeding any further, definitions must be given for the
terms used. The term "salvaging" is misleading as a description
of the current code's function, since an operational
description is mostly "directory checking," with correction
occurring infrequently. When "salvager" is used, it will refer
to today's operation. For the proposed design, compound terms
will be used to indicate more clearly which operations are being
discussed.

"Directory checking" is defined as the code which detects errors
in directories. "Directory salvager" denotes the code which
corrects and compacts directories. "Connection checking" refers
to the code which checks branch-vtoce connections. "Volume
salvager" refers to the code which performs garbage collection
on a volume.

Specifications for the Directory Salvager

The directory salvager is first viewed as a black box with
input, output, and environmental specifications. The following
describes the input and output constraints:

1. The input is a bit string and some (read-only) context
predicates.

2. The output is a valid directory (this includes a null
directory).

3. Given a valid directory as input, the output is the same
valid directory. This can be called the non-destruction
rule.

   a. Optionally, given a valid directory and a length as
   input, the output is a valid directory which includes
   the input valid directory as a subset. Valid objects
   that exist within the input length will be connected to
   the appropriate places. This is the reclamation
   option.

4. If an invalid directory object is found, then it will be
changed to be valid only if no security compromise can
occur. If it is changed, then a possible loss of correctness
will be indicated. Otherwise, it will be discarded. This
is the object acceptance criterion, and its inverse is the
discard criterion.

A few words about the use of the terms "validity" and
"correctness" are in order. An object is correct if its data has
not been changed by any means other than by user calls to storage
system entry points that are provided specifically to change the
data. (Correct data is simply data that has not been clobbered
by the system.) Only certain correctness losses are detectable.

An object is valid if its structure conforms to the rules that
are implicitly given by the storage system implementation. A
minimum set of validity rules is defined by a particular
implementation. The directory salvager can, of course, validate
all possible structural parts, guaranteeing validity irrespective
of any storage implementation changes. This extra checking
guarantees that all errors (clobberings) which span both data
and structure will, if possible, be detected. Thus the
probability of detecting correctness losses is increased.

Design Basics 

Clearly, it is only necessary to rebuild directories that have
errors in them. All other rebuilding is wasteful and adds to the
cost of the service. If the goal of service continuity is put
aside for the moment, it would be acceptable to rebuild only the
one directory that caused a crash. In the current salvager,
considerations of reliability and garbage collection had made us
willing to spend the processing time required to salvage all
directories, in the belief that other inconsistencies might exist
and would cause crashes shortly into the next boot load. We
cannot afford such action on larger hierarchies because salvaging
time increases linearly with the size of the hierarchy.

A deeper look into the use of the current salvager at external
sites reveals another purpose: that of restoring the confidence
level in an "intact" hierarchy to 100%. (Here "intact" is
used to convey the ideas of correctness and validity.)

It is proposed that directory structure checking become an
integral feature of the online operation of directory control and
that the notion of a separate salvager subsystem be dropped.
Error detection will be done dynamically, and corrections will be
done by online rebuilding. Dynamic checking can be visualized as
the breaking up of the current salvager into two parts:
scattering the checking function throughout directory control,
and retaining the directory rebuild function. A salvage of the
entire hierarchy will still be possible, but will rarely be used.

The economics of dynamic checking indicate that it will be more
expensive than today's offline strategy on stable hardware
configurations with small disk complements. Part of this cost can
be written off as that necessary for utility operation; i.e., with
dynamic checking and online rebuilding, the mean time between
failures should increase, and down time should be minimized to that
necessary to repair failures not caused by the storage system.
Also, the checks are applied in direct proportion to the activity
of a directory; a quiescent directory is not checked. A
significant cost reduction will be made by altering structures to
decrease checking time and to increase error detection
probability. Since these costs can only be given in ballpark
figures today, part of the design process will be to measure the
actual costs on an NSS system before deciding what checking is
viable.

Environment

The environment of the directory salvager is considered next. As
is true for the offline salvager, the directory salvager relies
on correctly functioning lower level machines. Both today's
salvager and the NSS directory salvager assume that hardware,
page control, and the syserr mechanism work. In addition, the
directory salvager will assume that directory locking is also
functioning. These assumptions can be made safely as long as
errors from lower level machines are either processed in the
lower level machine or are random. A random error distribution
guarantees that the directory salvager will eventually run during
a time period when no errors occur, and therefore will return a
valid directory.

As insurance against non-random errors that are not detected by
the directory salvager, a small array of invocation times and
errors found will be kept in every directory header. If the
directory salvager is invoked too frequently, it will inform the
operator that a possible loop exists. A review of the errors
found should help in determining whether hardware or software is
suspect.
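
The invocation-history idea can be sketched as follows. The array size (4 slots), the "too frequent" rule (the mean interval between recorded invocations falls below an hour), and all names are hypothetical. The bulletin does not fix any of them.

```python
from collections import deque


class SalvageHistory:
    """Hypothetical stand-in for the small array of invocation times and
    error counts kept in a directory header."""

    def __init__(self, slots: int = 4, min_interval: float = 3600.0):
        # Fixed-size array: once full, the oldest entry is discarded.
        self.entries = deque(maxlen=slots)   # (time in seconds, errors found)
        self.min_interval = min_interval     # assumed acceptable mean spacing

    def record(self, now: float, errors_found: int) -> bool:
        """Log one invocation; True means tell the operator a loop may exist."""
        self.entries.append((now, errors_found))
        return self.possible_loop()

    def possible_loop(self) -> bool:
        # Judge only once the array is full, then flag a loop when the mean
        # interval between the recorded invocations is below min_interval.
        if len(self.entries) < self.entries.maxlen:
            return False
        span = self.entries[-1][0] - self.entries[0][0]
        return span < self.min_interval * (len(self.entries) - 1)

    def errors_seen(self) -> list:
        """Errors found per invocation, to help decide hardware vs. software."""
        return [errors for _, errors in self.entries]
```

A fixed-size array keeps the header cost constant. It also means the loop decision always looks at the most recent invocations only.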

The ability of a boot to always get to command level is an
important factor in the confidence level in Multics. The offline
salvager's contribution here was to guarantee structurally valid
libraries. The equivalent confidence in the New Storage System
will be achieved as follows: part of every boot will be to run
the volume salvager over the root's physical volume and to
directory-salvage the root, >system_control_1, and any other
important system libraries. The inclusion of salvaging as an
integral part of a Multics boot relies on a hardcore partition as
proposed in MTB-213. If the answering service cannot be started,
a reload of the primary system libraries could be performed.
Once command level is reached, every site is at liberty to
specify more checking in its startup sequence.

Distributed Checking

So far the directory salvager has been viewed as a black box. The
following section describes the specific checks and structure
changes that are proposed.

At the end of each structure a checksum field and an owner field
will be added. The owner field will contain the directory uid
for directory-threaded objects, and the entry uid for
entry-threaded objects. A length field and an object type field
will be added to the front of each object. Each threaded list will
have a unique count of the number of members in the list. Since this
is already true for all lists except initial acls, the initial
acl total count will be replaced by an array of individual list
counts.
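
The proposed framing can be sketched in Python for illustration. The real structures would be PL/I declarations. Only three things come from the text: the length and type go at the front, the checksum and owner uid go at the end, and a member count is kept with each list. The field names, the type id values, and the two initial-acl list names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical type ids; the bulletin does not assign numeric values.
TYPE_BRANCH = 1
TYPE_NAME = 2
TYPE_ACL_ENTRY = 3


@dataclass
class DirObject:
    """One directory object: type and length first, checksum and owner last."""
    obj_type: int      # front: object type id
    length: int        # front: object length in words
    payload: list      # object-specific fields
    checksum: int      # end: checksum over the constant fields
    owner: int         # end: directory uid or entry uid


@dataclass
class ThreadedList:
    """A threaded list plus the member count kept alongside it."""
    members: list = field(default_factory=list)
    count: int = 0

    def append(self, obj: DirObject) -> None:
        self.members.append(obj)
        self.count += 1

    def count_agrees(self) -> bool:
        # Cross-check available to directory control and the salvager.
        return self.count == len(self.members)


# Hypothetical per-list counts replacing the single initial-acl total.
initial_acl_counts = {"segment": 0, "directory": 0}
```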

Therefore all directory objects will have the following format:



Based on directory structure statistics at MIT, the proposed
structure modifications would increase an average directory by
5%.

The following checks can be made by directory control. Each
check can be made independently of the others; thus the final
installation will have the most viable combination, as determined
by cost/performance studies outlined in MTB-221.

1. Change all storage system procedures that calculate relative
addresses to check if the address is 0 before using it. In
most cases this can be done by adding one instruction when
picking up relative pointers. For proper operation, the end
of a valid thread will be some value other than 0.

2. All directory objects will have a type id in their
structures. All references to a thread pointer, or to the
item itself, will first check for the correct type value.
This check will probably add three instructions to every
reference. In a similar manner, the length and owner fields
can be checked.

3. For all directory objects, templates are formed which test
all the constant bits. For ASCII data this would translate
to testing that the first two bits of each character are 0. For
directory headers and entries this would translate to
testing that all pad fields remained 0. These checks can be
implemented via extended instructions and a test. Template
checking will first be used by the directory salvager when
rebuilding a directory.

4. Finally, each object will have a checksum stored along with
its data. A checksum calculation would take as many
instructions as the length of the object, plus two more for a
comparison and transfer. The cost of storing a checksum when
an object is created or changed is negligible if we assume
that the number of directory references is much greater than
the number of directory modifications; thus it should be
calculated along with each modification and used by the
directory salvager. Checksums will be calculated for only
the relatively constant data fields in a structure, not for
items such as date-time-used.
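
Checks 1, 2 and 4 can be combined in one thread walk, sketched below. Check 3 is left out to keep the sketch short. Several details are assumptions:

- the directory is modelled as a dictionary from relative offset to object record;
- the end-of-thread value is -1;
- the checksum is an additive sum modulo 2**36.

The bulletin specifies none of these. Using the list's member count as an upper bound also stops a corrupted, circular thread from looping forever.

```python
class InvalidDirectory(Exception):
    """Stands in for signalling that a directory check failed."""


END_OF_THREAD = -1  # hypothetical end-of-thread value: anything but 0


def checksum(words):
    # Assumed checksum: sum of the constant fields modulo 2**36 (the Multics
    # word size). The bulletin does not specify the function.
    return sum(words) % (1 << 36)


def walk_thread(directory, head, expected_type, owner_uid, count):
    """Follow a thread of relative pointers, applying checks 1, 2 and 4.

    directory maps a relative offset to an object record with the keys
    'type', 'owner', 'next', 'const' (constant fields) and 'checksum'.
    count is the member count kept with the list.
    """
    found = []
    rel = head
    while rel != END_OF_THREAD:
        if rel == 0:                                   # check 1
            raise InvalidDirectory("zero relative pointer")
        if len(found) == count:                        # also stops loops
            raise InvalidDirectory("thread longer than its count")
        obj = directory.get(rel)
        if obj is None:
            raise InvalidDirectory(f"pointer {rel} is not an object")
        if obj["type"] != expected_type:               # check 2: type id
            raise InvalidDirectory(f"wrong type at {rel}")
        if obj["owner"] != owner_uid:                  # check 2: owner field
            raise InvalidDirectory(f"wrong owner at {rel}")
        if checksum(obj["const"]) != obj["checksum"]:  # check 4
            raise InvalidDirectory(f"bad checksum at {rel}")
        found.append(rel)
        rel = obj["next"]
    if len(found) != count:
        raise InvalidDirectory("thread shorter than its count")
    return found
```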

An easily implemented installation option would be to template
and/or checksum all access names during an access mode
calculation. Access_mode already references exception bits in the
ods and thus would require only two or three extra instructions
to check another exception bit.
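
The ASCII template from check 3 can be applied to an access name as sketched below. Multics stores characters in 9-bit bytes, and a 7-bit ASCII code leaves the two high-order bits of each byte at 0. A nonzero value there therefore means the name has been clobbered. The function names and the sample access name are hypothetical. A real implementation would use the extended instructions rather than a per-character loop.

```python
ASCII_TEMPLATE = 0o600  # the two high-order bits of a 9-bit character


def nine_bit_chars(text):
    """Represent an ASCII string as a list of 9-bit character values."""
    return [ord(c) for c in text]


def template_ok(chars):
    """Template check: every character's two high-order bits must be 0."""
    return all((c & ASCII_TEMPLATE) == 0 for c in chars)
```

For example, `template_ok(nine_bit_chars("Jones.Multics.a"))` holds. Setting the top bit of any one character makes the check fail.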

ACL Errors

When an acl error occurs, the current directory strategy of
sharing access names creates problems in retaining any
information from the valid acl entries. All acls that share a
bad access name must be deleted, as no secure method exists for
protecting the integrity of an acl. It is proposed that an acl
out-of-service condition be supported by the storage system.
Replacement of the acl would be required to turn on service. But
the name-sharing strategy has often produced many (if not all)
invalidated acls in a directory. Thus multiple corrections for
one error are required. If acl errors are frequent enough in NSS,
then sharing of access names should be dropped. This change
would also have the beneficial effect of localizing all branch
attributes, thus reducing page faults. Sharing within a branch
would still be supported.

The cost of not sharing acl names is rather high, with an average
increase in directory size of 45%. Even if the savings obtained
from reduced page faults (due to the localization of the acl) are
included, the sum will still show a cost increase. Recovery of
the cost increase could be accomplished by implementing variable
size hash tables.

Directory Control Changes

As well as type and owner field checking, certain bounds and
cross-checks of structure values will be added to directory
control when it is in the explicit interest of a procedure to
decrease its gullibility. Several checks of this kind already
exist; for example, when acls are listed, the number found in the
acl list is compared to the count in the branch.

Currently the count of sharers kept with each acl name is used to
allow deletion of the name. If the count was incorrect, then
reassignment of the name slot would change a person or project
name on several acls. For this reason, it is proposed that access
names not be freed until a rebuild is performed. A reference to a
name with a zero sharing count will be one more form of error
detection.
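
The no-free rule for shared access names can be sketched as a small table. The class, its methods, and the slot layout are hypothetical. The sketch shows the two properties the text asks for. First, a slot whose sharing count reaches zero is never handed to a different name before a rebuild. Second, referencing such a slot is detected as an error.

```python
class InvalidDirectory(Exception):
    """Stands in for signalling that a directory check failed."""


class AccessNameTable:
    """Hypothetical shared access-name area: each slot holds a name component
    and the count of acl entries sharing it. Slots are never freed or
    reassigned between rebuilds, so a wrong count cannot rename acls."""

    def __init__(self):
        self.names = []     # slot -> access name component
        self.sharers = []   # slot -> number of acl entries using the slot

    def share(self, name):
        """Return a live slot for name, adding a new slot at the end if needed."""
        for slot, existing in enumerate(self.names):
            if existing == name and self.sharers[slot] > 0:
                self.sharers[slot] += 1
                return slot
        self.names.append(name)
        self.sharers.append(1)
        return len(self.names) - 1

    def unshare(self, slot):
        # Decrement only; the slot stays allocated until the next rebuild.
        self.sharers[slot] -= 1

    def lookup(self, slot):
        if self.sharers[slot] <= 0:
            raise InvalidDirectory(f"reference to unshared name slot {slot}")
        return self.names[slot]

    def rebuild(self):
        """Keep only names still in use; return a map from old to new slot."""
        remap, names, sharers = {}, [], []
        for slot, (name, count) in enumerate(zip(self.names, self.sharers)):
            if count > 0:
                remap[slot] = len(names)
                names.append(name)
                sharers.append(count)
        self.names, self.sharers = names, sharers
        return remap
```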

Directory Allocation 

A simplification to the directory allocation scheme is proposed.
Instead of maintaining several different fixed-size free lists,
all allocation requests will be placed at the end of a directory.
Slots that are freed will be zeroed and not reused. A total of
the freed space will be kept so that the necessity of compaction
can be determined. This strategy has the desired effect of
isolating the introduction of errors. For example, a new branch
will have its names and acls physically as well as logically
attached. In case of modifications (deletions and additions), not
reusing the freed slots allows the detection of cross-threading
errors, something the current salvager does not check. Thus we
are introducing segregation in an attempt to lessen the occurrence
of errors that spill over (affect more than one branch).
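
A minimal sketch of the append-only scheme, modelling a directory as a flat list of words. The class name and the compaction policy are assumptions: compaction is recommended once more than a quarter of the used space is dead. The comment about cross-threading further assumes that 0 is never a valid type id and that the type id is the first word of an object.

```python
class AppendAllocator:
    """Sketch of the proposed directory allocation: requests go at the end,
    freed slots are zeroed and never reused, and a running total of freed
    words tells when compaction (a rebuild) is worthwhile."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.next_free = 0      # offset of the first never-used word
        self.freed_total = 0    # words freed since the last rebuild

    def allocate(self, n):
        if self.next_free + n > len(self.words):
            raise MemoryError("no room at end of directory; rebuild needed")
        offset = self.next_free
        self.next_free += n
        return offset

    def free(self, offset, n):
        # Zeroing means a stale (cross-threaded) pointer into this slot finds
        # a type id of 0, assumed never valid, instead of an old object.
        self.words[offset:offset + n] = [0] * n
        self.freed_total += n

    def needs_compaction(self, fraction=0.25):
        # Hypothetical policy: compact once over a quarter of used space is dead.
        return self.freed_total > fraction * self.next_free
```

Because allocation only ever advances `next_free`, no bookkeeping of free lists is needed. This is consistent with the text's remark that space management becomes simple enough to do inline.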

Now that allocation can accept any reasonable size request, links
can be stored more compactly. Also, the introduction of new
objects into a directory need not consider the current block size
limitations. Changing the sizes of current structures is also
facilitated. Implicit in this suggestion is that directory space
management will be done inline by directory control because it is
so simple. If variable size allocation is adopted, then the
proposed new directory structures would only increase an average
directory's size by 3% rather than 5%. If variable size hash
tables are implemented, then a net size decrease of 30% could be
achieved.

Triggering Rebuilds 

Whenever the freed space total and the count of directory
attribute modifications exceed some threshold, the directory
salvager will be invoked to perform a rebuild. Here we have
achieved the desired property that the more a directory is
modified, the more often it is validated. It is not necessary to
count directory read operations, because the new storage system
design does not require any directory modifications in order to
read (search) directories.
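
The rebuild trigger can be sketched as a pair of per-directory counters. It assumes that each quantity has its own threshold and that exceeding either one requests a rebuild. The text leaves open whether the two are combined. The class and field names are hypothetical. Read operations never touch the counters, matching the observation that searching requires no directory modification.

```python
from dataclasses import dataclass


@dataclass
class RebuildTrigger:
    """Per-directory counters deciding when the directory salvager should
    rebuild. Thresholds are hypothetical; reads are never counted."""
    freed_limit: int            # words of freed space tolerated
    modification_limit: int     # attribute modifications tolerated
    freed_total: int = 0
    modifications: int = 0

    def record_modification(self, words_freed=0):
        """Call on every attribute modification; True means rebuild now."""
        self.modifications += 1
        self.freed_total += words_freed
        return (self.freed_total > self.freed_limit
                or self.modifications > self.modification_limit)

    def rebuilt(self):
        # A rebuild compacts the directory, so both counters start over.
        self.freed_total = 0
        self.modifications = 0
```
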

For dynamically detected errors, the mechanism used to trigger
the directory salvager is the following: whenever a directory is
locked, a handler for the "invalid_directory" condition is set up
to call the directory salvager. After the rebuilding, control is
transferred to the statement following the lock call, thus
repeating the function on the rebuilt directory. Internal
directory control procedures need only signal whenever an error
is found. All procedures which lock directories must be checked
for code which will operate properly when restarted after the
lock call; for instance, variables assigned before the lock call
cannot be reassigned after the lock call.
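
The lock-time handler and restart can be approximated in Python, with an exception standing in for the PL/I on-unit. The function names are hypothetical. So is the cap on rebuilds, added so that a directory that stays invalid is reported rather than retried forever. Because the operation restarts from the point just after the lock, it must be safe to re-run.

```python
class InvalidDirectory(Exception):
    """Stands in for the invalid_directory condition."""


def locked_call(directory, operation, salvage, max_rebuilds=2):
    """Lock a directory and run operation on it. If operation signals
    InvalidDirectory, rebuild with salvage and restart operation from the
    point just after the lock, i.e. from its beginning."""
    directory["locked"] = True          # stands in for the lock call
    try:
        rebuilds = 0
        while True:
            try:
                return operation(directory)
            except InvalidDirectory:
                if rebuilds == max_rebuilds:
                    # Repeated failure: report rather than loop forever.
                    raise
                salvage(directory)      # directory salvager rebuild
                rebuilds += 1
    finally:
        directory["locked"] = False     # stands in for the unlock call
```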

Error Reporting

The methods used today for reporting errors detected by the
salvager are inadequate. Of significant concern to users are
missing branches, bad names, and lost acls. Although the
salvager prints all detected errors, no method exists for
distributing these messages or issuing warnings. There is also a
problem in deciding who should receive the messages.

Both the directory salvager and the volume salvager will use the
syserr mechanism for recording errors, as the syserr log is the
permanent record of system events (especially detected failings).
The log can be processed online in order to detect error patterns
and perhaps even predict hardware failures. To reflect errors to
users, flags will be set. Bad-name and missing-branch flags
will be set in the directory header, while an invalid acl flag
will be set in the branch. The current action of deleting
invalid acls will be changed to retain the acl for listing
purposes. Directory control will treat the invalid acl flag as if
the acl were null (acl out-of-service). The invalid acl flag can
be turned off by either deleting the entire acl or replacing it.
No action for missing names and branches is currently planned, as
these are relevant to Multics search rules and could affect every
process. For example, if a name were missing in >sss, then every
process might be stopped until the flag was reset. In the future,
such errors could become visible by changing the search of such
directories to signal some condition. The default action would
be to ignore this signal.
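
The flag semantics can be sketched as follows. These are hypothetical Python stand-ins for the directory header and the branch. An acl marked invalid is kept for listing but treated as null for access decisions. The flag is cleared only by replacing or deleting the whole acl. Class and field names are assumptions, and acl entries are plain (access name, mode) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryHeader:
    bad_names: bool = False          # set by the directory salvager
    missing_branches: bool = False   # set by the directory salvager


@dataclass
class Branch:
    name: str
    acl: list = field(default_factory=list)  # (access name, mode) pairs
    invalid_acl: bool = False

    def effective_acl(self):
        """Acl used for access decisions: a flagged acl counts as null."""
        return [] if self.invalid_acl else self.acl

    def list_acl(self):
        """Listing still shows the retained entries, flagged or not."""
        return list(self.acl)

    def replace_acl(self, new_acl):
        self.acl = list(new_acl)
        self.invalid_acl = False

    def delete_acl(self):
        self.acl = []
        self.invalid_acl = False
```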

Storage Control 

Storage control errors are handled by a volume salvager. The
input and output specifications for the volume salvager are as
follows:

1. The input is the string of bits that comprise a volume.

2. The output is a valid New Storage System format volume.

3. Given a valid storage volume, the same storage volume is
returned.

4. If an invalid vtoce is found, it will be deleted. If a
reused address is found and the vtoce appears to be a
directory, the volume salvager will invoke the directory
salvager on this directory. If no errors are found, then
the page in question will be awarded to the directory and
the vtoce set out-of-service. Turning on service to such a
directory must be performed by an administrator.

The volume salvager environment is also similar to the directory
salvager's. Each establishes exclusive control over its subjects
(in this case a disk pack). Also, each relies on correctly
functioning lower level mechanisms. For the volume salvager this
is disk I/O. In the final implementation, the volume salvager
should gain control of the volume via RCP. But for now, a direct
path to the disk dim will be used.

Disk Packs

To aid in checking vtoces, their structure will be extended to be
similar to that of directory objects. The vtoce checksum will
cover only the uid pathname and the access class, as other vtoce
fields change too frequently.

Since only a limited reused address check can be made by page
control (the user of vtoces), volume checking would normally
occur infrequently. Therefore, triggering the volume salvager has
to be accomplished artificially. One installation option would
be to salvage at the time the disk is logically connected. This
might be judged too costly (1-3 min. per MSU0400), so that
scheduled volume salvaging could be implemented for slack time
periods.

As well as salvaging all vtoces, the volume salvager will
reconstruct the volume map and will check for reused addresses. A
reused address involving a directory will be resolved by asking
the directory salvager if any errors were found in salvaging the
pages that include the reused address data. A finding of no
errors would result in awarding the page to that directory. If
errors were found, then the rebuilt version of the directory with
a zero page would replace the bad one, and a retrieval request
for that directory would be issued. (A more detailed description of
directory retrievals will be given in the backup MTB.) A reused
address on a segment would result in a null address award
(equivalent to zeroing), an out-of-service indication, and a
retrieval request. A second pass over the volume will be made to
handle any directories that were the first claimants of reused
addresses.
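
The detection half of this pass is sketched below: rebuild the volume map and collect every record claimed by more than one vtoce, first claimant first. The resolution steps are not modelled. Those steps are asking the directory salvager, null address awards, retrieval requests, and the second pass. Two further points are assumptions. A vtoce is treated as invalid when its file map holds an out-of-range address. The data layout (vtoce index mapped to a list of record addresses) is also hypothetical.

```python
def rebuild_volume_map(vtoces, n_records):
    """First pass of a volume salvage, as a sketch.

    vtoces maps a vtoce index to the record addresses in its file map.
    Returns (free_map, reused, invalid):
      free_map: rebuilt volume map, True where a record is free
      reused:   record address -> all claimants, first claimant first
      invalid:  vtoces holding out-of-range addresses (to be deleted)
    """
    owner = {}       # record address -> first claimant
    reused = {}
    invalid = []
    for vtoce, addresses in sorted(vtoces.items()):
        if any(not 0 <= a < n_records for a in addresses):
            invalid.append(vtoce)          # invalid vtoce: claims nothing
            continue
        for a in addresses:
            if a in owner:
                reused.setdefault(a, [owner[a]]).append(vtoce)
            else:
                owner[a] = vtoce
    free_map = [a not in owner for a in range(n_records)]
    return free_map, reused, invalid
```

Keeping the first claimant at the head of each list gives the second pass what it needs: the directories that were first claimants of reused addresses.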

Branch-VTOCE Connection

The new storage system design includes the dynamic checking of
the logical connection between the branch and vtoce at activation
time. The resolution of an error at this time should be as
follows:

1. Check the vtoce checksum. If it is correct, then mark the
branch as unconnected. A user encountering an unconnected
branch can either delete it or issue a retrieval request for
its vtoce. A future addition might be to allow scanning of
volumes for an unconnected vtoce entry with the matching uid
and, upon finding one, connecting the branch to it.

2. If the checksum is wrong, then mark the vtoce out-of-service
and issue a retrieval request for that vtoce. The user
referencing the branch would receive the out-of-service
error immediately and could try again at some later time.
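
The activation-time resolution can be sketched as below, with the vtoce checksum covering only the uid pathname and access class as proposed. Several details are assumptions:

- the additive checksum formula;
- the dictionary layout;
- treating a match between the last uid in the pathname and the branch uid as "connected";
- queueing the retrieval under the branch's uid, since the damaged vtoce's own uid may not be trustworthy.

```python
CHECKSUM_MODULUS = 1 << 36   # assumed: one 36-bit Multics word


def vtoce_checksum(uid_path, access_class):
    # Covers only the uid pathname and the access class, as proposed;
    # the additive formula itself is an assumption.
    return (sum(uid_path) + access_class) % CHECKSUM_MODULUS


def activate(branch, vtoce, retrievals):
    """Check the branch-vtoce connection at activation time.

    Returns 'connected', 'unconnected' or 'out_of_service'.
    """
    if vtoce["uid_path"][-1] == branch["uid"]:
        return "connected"
    if vtoce_checksum(vtoce["uid_path"], vtoce["access_class"]) == vtoce["checksum"]:
        # Intact vtoce that belongs elsewhere: the branch has lost its vtoce.
        branch["unconnected"] = True
        return "unconnected"
    # Damaged vtoce: take it out of service and request a retrieval.
    vtoce["out_of_service"] = True
    retrievals.append(branch["uid"])
    return "out_of_service"
```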

One future extension should be mentioned. Whenever a directory
retrieval is performed, instead of replacing the contents in
toto, the version from backup and the existing version could be
logically coalesced. This would prevent the loss of new branches. In
any case, notice that although a retrieval can return already
deleted branches, the correct action is taken at activation when
a connection mismatch is detected.

Loops 

In the effort to preserve all possible information, we have
chosen not to delete objects but to mark them as having errors
and to allow users to issue retrievals. Unfortunately, there is
no guarantee that the retrieved information is correct; in fact,
it may have the same error. This is a loop which only a user can
detect. The resolution is that, if necessary, a retrieval of a
previous copy should be tried, ad infinitum.

An apparent loop also exists in the specification of reused
address processing. Assume that the volume salvager detects a
reused address when processing a directory vtoce. It asks the
directory salvager for some advice. But the directory salvager,
in formulating its opinion, can get a reused address signaled
from page control, and this would invoke the volume salvager!
This sequence is prevented from happening if we ensure that all
addresses in a particular directory are unique (done by the
volume salvager) and that the volume salvager has exclusive
control of the disk pack (thus page control cannot signal a
reused address on it).

Loose Ends

Purposely saved until the end is the subject of quota validation.
The elimination of offline salvaging implicitly dropped this
function, since it can only be done on quiescent subtrees. It
could be performed online only if there were a guarantee that this
was the only process looking at the subtree. One approach to
achieve this would be to turn on security out-of-service for all
components in the subtree. Once we assume or take action to
provide exclusiveness, a procedure which sets the used values in
an aste must be provided. It is proposed that quota validation
become a part of the administrative mechanisms used in
determining volume usage charges.

While on the subject of charges, notice that the directory
control checking design has transformed the collective aggregate
cost of offline salvaging into a "pay as you use" part of the
storage system that is assigned to individual processes. For
physical volumes that are wholly owned by projects, even the use
of the volume salvager as the garbage collection device is
automatically charged to the correct project.



Summary

1. The salvager is split into three parts: a directory salvager
which rebuilds directories, scattered checking in directory
control, and a volume salvager which checks for reused
addresses and rebuilds the volume map.

2. Detected errors are entered in the syserr log, and users are
notified of errors by out-of-service conditions and error
bits in the directory header.

3. Directory structures are expanded to be more robust, and the
directory allocation scheme is changed to take advantage of
the directory salvager. The costs for an average directory
are as follows:

   structure changes
   variable allocation
   non-shared access names
   variable size hash table
   net size change                 -30%



(end) 



